Report for answering 2nd Part of Capstone Project (Full Report)

Capstone Project - Week 2 (Full Report)

Introduction: Business Problem

This project examines New York City (NYC) to find an optimal neighborhood for opening a Mexican restaurant.

New York City (pre-COVID) is a city bustling with restaurants and pedestrian traffic. The code examines data to find a location based on the following criteria:

  • A new restaurant prefers to open near other restaurants (restaurants prefer to be clustered together rather than isolated).
  • Restaurants prefer to be near a strong customer "draw" (e.g. a school, office district, or major venue such as a stadium or theatre).
  • A restaurant prefers to be the only restaurant of its type in its neighborhood (e.g. a Mexican restaurant would not want to be near another Mexican restaurant).

Data

Based on the criteria in the introduction, the data collected is a dataset of venues in the NYC area, broken down into smaller, more manageable subsets.

The data is collected using the Foursquare API. Coordinates are looked up for each neighborhood, and pandas DataFrames are built from the data the API returns. Maps and other visualization tools are then used to assist in choosing the best locations for opening a Mexican restaurant. (All the code for this project is located after this report.)

An initial pull retrieves all venues, which are then filtered into relevant subsets.

Methodology

The newyork_data.json file is used as the starting source for this project instead of the Google Maps API. This JSON data is loaded into a dataframe. Initially, it contains information on all the neighborhoods in all the New York City boroughs. This data is then filtered down to focus only on Manhattan neighborhoods (with their respective coordinates).
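
The loading and filtering step can be sketched on a tiny inline sample. The two features below reuse coordinates that appear in the notebook output further down. The `sample` dictionary and the `features_to_rows` helper are illustrative names, not part of the original notebook, and the real file is assumed to follow the same GeoJSON-style layout.

```python
# Minimal sketch: flatten GeoJSON-style features into rows, then keep Manhattan.
# `sample` stands in for the contents of newyork_data.json (assumed layout).
sample = {
    "features": [
        {"properties": {"borough": "Bronx", "name": "Wakefield"},
         "geometry": {"coordinates": [-73.847201, 40.894705]}},
        {"properties": {"borough": "Manhattan", "name": "Marble Hill"},
         "geometry": {"coordinates": [-73.910660, 40.876551]}},
    ]
}


def features_to_rows(data):
    """Return one dict per neighborhood (hypothetical helper)."""
    rows = []
    for feature in data["features"]:
        # Coordinates are stored as [longitude, latitude].
        lon, lat = feature["geometry"]["coordinates"]
        rows.append({
            "Borough": feature["properties"]["borough"],
            "Neighborhood": feature["properties"]["name"],
            "Latitude": lat,
            "Longitude": lon,
        })
    return rows


rows = features_to_rows(sample)
manhattan_rows = [r for r in rows if r["Borough"] == "Manhattan"]
print(manhattan_rows)
```

A list of dicts like `rows` can be passed straight to `pd.DataFrame(rows)` to obtain the same four columns used in the code section.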

A pre-written function is used to iterate through the filtered manhattan_data dataset whilst making calls to the FourSquare API to generate a new dataset (manhattan_venues) which contains all the venues located in the FourSquare dataset for Manhattan.

This data is then manipulated and cleaned to produce subsets that include only venues whose category is a restaurant and, more specifically, only Mexican restaurants. These datasets then serve as our primary data source for visualizing and making decisions about possible good locations to open a Mexican restaurant in Manhattan.
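
As a rough illustration of that filtering, the sketch below splits a toy venue table into all restaurants, Mexican restaurants, and non-Mexican restaurants. The venue names and categories are copied from output shown in the code section; the variable names are assumptions made for this example.

```python
import pandas as pd

# Toy venue table; names and categories are taken from output shown in this notebook.
venues = pd.DataFrame({
    "Venue": ["Factory Tamal", "Il Punto Ristorante", "Thai Select", "Silver Towers Dog Run"],
    "Venue_Category": ["Mexican Restaurant", "Italian Restaurant", "Thai Restaurant", "Dog Run"],
})

# Keep only rows whose category mentions "Restaurant" (plain substring match).
is_restaurant = venues["Venue_Category"].str.contains("Restaurant", regex=False)
all_restaurants = venues[is_restaurant]

# Split the restaurants into Mexican and non-Mexican subsets.
is_mexican = all_restaurants["Venue_Category"] == "Mexican Restaurant"
mexican = all_restaurants[is_mexican]
non_mexican = all_restaurants[~is_mexican]

print(len(all_restaurants), len(mexican), len(non_mexican))
```

One consequence of matching on the word "Restaurant" is that food venues with categories such as "Steakhouse" or "BBQ Joint", both of which appear in the venue output, are not counted as restaurants.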

Results

The following images are generated as part of the visualization of data collected.

Image showing Restaurants in Manhattan (Map 1)

Blue dots are Mexican restaurants. Red dots are non-Mexican restaurants.

image.png

The following image shows the concentration of Mexican restaurants. (Map 2)

image.png

The following image shows the KMeans clustering data. (Map 3)

image.png

Discussion

Based on preliminary data, an ideal location for opening a Mexican restaurant may be in the areas where they are notably absent, specifically the lower West Side and West Midtown (see Map 2).
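
One way to turn this observation into a shortlist is to keep only neighborhoods with no Mexican restaurant and order them by how many other restaurants they have, which reflects the first and third criteria from the introduction. The sketch below uses made-up neighborhoods "A", "B", and "C"; it is an illustration under those assumptions, not the ranking used in this report.

```python
import pandas as pd

# Hypothetical (neighborhood, category) rows, for illustration only.
rows = [
    ("A", "Italian Restaurant"), ("A", "Thai Restaurant"), ("A", "Italian Restaurant"),
    ("B", "Mexican Restaurant"), ("B", "Italian Restaurant"),
    ("C", "Thai Restaurant"),
]
df = pd.DataFrame(rows, columns=["Neighborhood", "Venue_Category"])

# Total restaurants per neighborhood.
total = df.groupby("Neighborhood").size()

# Mexican restaurants per neighborhood, with 0 where none were found.
mexican = df[df["Venue_Category"] == "Mexican Restaurant"].groupby("Neighborhood").size()
mexican = mexican.reindex(total.index, fill_value=0)

# Candidates: no Mexican restaurant, busiest restaurant scene first.
candidates = total[mexican == 0].sort_values(ascending=False)
print(candidates.index.tolist())
```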

Conclusion



-------------------- Code Section --------------------

This notebook uses data to suggest the best location for opening a Mexican restaurant in New York City.

This is a completely fictional scenario. It assumes the following:

  • Restaurant owners prefer to be located near other restaurants.
  • Restaurant owners prefer to be near a potential market (school, office district).
  • Restaurant owners prefer not to be near immediate competitors (e.g. a Mexican restaurant does not want to be located near another Mexican restaurant, but would prefer being close to Italian, Ethiopian, etc.).
In [2]:
pip install geopy
Collecting geopy
  Downloading https://files.pythonhosted.org/packages/07/e1/9c72de674d5c2b8fcb0738a5ceeb5424941fefa080bfe4e240d0bacb5a38/geopy-2.0.0-py3-none-any.whl (111kB)
     |████████████████████████████████| 112kB 24.2MB/s eta 0:00:01
Collecting geographiclib<2,>=1.49 (from geopy)
  Downloading https://files.pythonhosted.org/packages/8b/62/26ec95a98ba64299163199e95ad1b0e34ad3f4e176e221c40245f211e425/geographiclib-1.50-py3-none-any.whl
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-1.50 geopy-2.0.0
Note: you may need to restart the kernel to use updated packages.
In [1]:
#Import all required libraries
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
In [2]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
#newyork_data
In [3]:
neighborhoods_data = newyork_data['features']
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one row per neighborhood, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # coordinates are stored as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# instantiate the dataframe
neighborhoods = pd.DataFrame(rows, columns=column_names)
In [4]:
neighborhoods.head()
Out[4]:
Borough Neighborhood Latitude Longitude
0 Bronx Wakefield 40.894705 -73.847201
1 Bronx Co-op City 40.874294 -73.829939
2 Bronx Eastchester 40.887556 -73.827806
3 Bronx Fieldston 40.895437 -73.905643
4 Bronx Riverdale 40.890834 -73.912585
In [5]:
#Focus is on Manhattan for potential new restaurant...
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()
Out[5]:
Borough Neighborhood Latitude Longitude
0 Manhattan Marble Hill 40.876551 -73.910660
1 Manhattan Chinatown 40.715618 -73.994279
2 Manhattan Washington Heights 40.851903 -73.936900
3 Manhattan Inwood 40.867684 -73.921210
4 Manhattan Hamilton Heights 40.823604 -73.949688

Use data pulled from Foursquare to locate as many restaurant venues as possible, paying particular attention to the locations of Mexican restaurants

The following cell is hidden but it contains foursquare credentials

In [6]:
# @hidden_cell
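CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (redacted; do not publish real credentials)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version

LIMIT = 100 # limit on the number of venues requested per call
radius = 750 # note: not passed to getNearbyVenues below, which falls back to its default of 500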
In [7]:
# This function is for getting all venues in specific coordinates
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
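
The request URL in getNearbyVenues is assembled with str.format. As a hedged alternative, the sketch below builds the same kind of URL with the standard library's `urlencode`, so that each parameter is escaped consistently. The `build_explore_url` helper and the `EXPLORE_ENDPOINT` constant are hypothetical names introduced for this example; the endpoint string itself is the one used in the function above, and "ID" and "SECRET" are dummy values.

```python
from urllib.parse import urlencode

# Endpoint copied from getNearbyVenues above.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL with encoded query parameters (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll=lat,lng pair unescaped.
    return EXPLORE_ENDPOINT + "?" + urlencode(params, safe=",")


url = build_explore_url("ID", "SECRET", "20180605", 40.876551, -73.910660, 500, 100)
print(url)
```

When the requests library is used, the same `params` dictionary can instead be passed as `requests.get(EXPLORE_ENDPOINT, params=params)`, which performs the encoding itself.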
In [8]:
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )
Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards
In [9]:
manhattan_venues.tail(15)
Out[9]:
Neighborhood Neighborhood Latitude Neighborhood Longitude Venue Venue Latitude Venue Longitude Venue Category
3204 Hudson Yards 40.756658 -74.000111 Spanish Diner 40.752394 -74.001491 Spanish Restaurant
3205 Hudson Yards 40.756658 -74.000111 Romeo and Juliet Coffee 40.760726 -73.997724 Coffee Shop
3206 Hudson Yards 40.756658 -74.000111 Il Punto Ristorante 40.756079 -73.994594 Italian Restaurant
3207 Hudson Yards 40.756658 -74.000111 Rosewood Theatre 40.757999 -73.999582 Nightclub
3208 Hudson Yards 40.756658 -74.000111 Thai Select 40.754867 -73.995007 Thai Restaurant
3209 Hudson Yards 40.756658 -74.000111 Silver Towers Dog Run 40.760854 -73.999765 Dog Run
3210 Hudson Yards 40.756658 -74.000111 Uncle Jack's Steakhouse 40.753619 -73.996019 Steakhouse
3211 Hudson Yards 40.756658 -74.000111 Treadwell 40.759964 -73.996284 Restaurant
3212 Hudson Yards 40.756658 -74.000111 Playboy Club New York 40.760000 -73.996367 Lounge
3213 Hudson Yards 40.756658 -74.000111 Cachet Boutique Hotel 40.759773 -73.996460 Hotel
3214 Hudson Yards 40.756658 -74.000111 StarDust 40.759869 -73.996460 Nightclub
3215 Hudson Yards 40.756658 -74.000111 Big George's Smokehouse 40.757954 -74.002296 BBQ Joint
3216 Hudson Yards 40.756658 -74.000111 NY Waterway 42nd St Bus 40.760050 -74.003379 Bus Station
3217 Hudson Yards 40.756658 -74.000111 Twilight Cruise By Citysightseeing 40.759744 -74.004096 Boat or Ferry
3218 Hudson Yards 40.756658 -74.000111 City Lights Cruises 40.759804 -74.004025 Boat or Ferry

Begin analysis of Data

In [10]:
## Create a dataset of restaurants, then drop the Mexican ones
Restaurants = manhattan_venues.copy() # copy so that manhattan_venues itself is not modified
Restaurants.columns = [column.replace(" ","_") for column in Restaurants]
Restaurants.rename(columns = {'Venue_Latitude':'Latitude', 'Venue_Longitude':'Longitude'}, inplace = True)
Restaurants = Restaurants[Restaurants['Venue_Category'].str.contains("Restaurant")]

AllRestaurants = Restaurants # save the full restaurant dataset (Mexican included) for manipulation later

# Drop all Mexican restaurants
Restaurants = Restaurants[Restaurants.Venue_Category != 'Mexican Restaurant']

#Restaurants.head(10) #All non-Mexican restaurants
Restaurants.shape
#AllRestaurants.shape
Out[10]:
(848, 7)

Generating dataset for Mexican Restaurants only.

In [12]:
## Create a Mexican restaurant subset from the manhattan_venues dataset and visualize it
mexicanRestaurants = manhattan_venues.copy() # copy so that manhattan_venues itself is not modified
mexicanRestaurants.columns = [column.replace(" ","_") for column in mexicanRestaurants]
mexicanRestaurants.rename(columns = {'Venue_Latitude':'Latitude', 'Venue_Longitude':'Longitude'}, inplace = True)
mexicanRestaurants = mexicanRestaurants.query('Venue_Category == "Mexican Restaurant"')

#mexicanRestaurants
mexicanRestaurants.shape
mexicanRestaurants.head()
Out[12]:
Neighborhood Neighborhood_Latitude Neighborhood_Longitude Venue Latitude Longitude Venue_Category
84 Chinatown 40.715618 -73.994279 Factory Tamal 40.715876 -73.990467 Mexican Restaurant
94 Chinatown 40.715618 -73.994279 JaJaJa Plantas Mexicana 40.714198 -73.990157 Mexican Restaurant
172 Washington Heights 40.851903 -73.936900 Refried Beans Mexican Restaurant 40.855039 -73.937031 Mexican Restaurant
214 Inwood 40.867684 -73.921210 Guadalupe Bar and Grill 40.867334 -73.920863 Mexican Restaurant
235 Inwood 40.867684 -73.921210 Guacamole 40.869659 -73.916736 Mexican Restaurant
In [13]:
#This data indicates that there are 55 venues fitting the Category of 'Mexican Restaurant' in NYC (Manhattan)
#Look at data on a MAP (dataset is mexicanRestaurants)
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
#print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

map_newyork = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
#Blue Dots indicate MEXICAN Restaurant 
#Red Dots indicates non-Mexican Restaurant
for lat, lng, venue, neighborhood in zip(mexicanRestaurants['Latitude'], mexicanRestaurants['Longitude'],mexicanRestaurants['Venue'], mexicanRestaurants['Neighborhood']):
    label = '{},{}'.format(venue,neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=1,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)
        
for lat, lng, venue, neighborhood in zip(Restaurants['Latitude'], Restaurants['Longitude'], Restaurants['Venue'], Restaurants['Neighborhood']):
    label = '{},{}'.format(venue,neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=1,
        popup=label,
        color='red',
        fill=True,
        fill_color=None,
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)
map_newyork
Out[13]:
[Interactive folium map not rendered in this export: Map 1, restaurants in Manhattan]

One-hot encode the AllRestaurants dataset for later analysis

In [14]:
##This creates a new dataset that categorizes all the venues
## one hot encoding for Categorical Data
#manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue_Category']], prefix="", prefix_sep="")

## add neighborhood column back to dataframe
#manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

## move neighborhood column to the first column
#fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
#manhattan_onehot = manhattan_onehot[fixed_columns]

#manhattan_onehot.head()
##manhattan_onehot.shape

restaurant_onehot = pd.get_dummies(AllRestaurants[['Venue_Category']], prefix="", prefix_sep="")
restaurant_onehot['Neighborhood'] = AllRestaurants['Neighborhood']
fc = [restaurant_onehot.columns[-1]] + list(restaurant_onehot.columns[:-1])
restaurant_onehot = restaurant_onehot[fc]

restaurant_onehot.shape
restaurant_onehot.head()
Out[14]:
Neighborhood African Restaurant American Restaurant Arepa Restaurant Argentinian Restaurant Asian Restaurant Australian Restaurant Austrian Restaurant Brazilian Restaurant Cajun / Creole Restaurant Cantonese Restaurant Caribbean Restaurant Chinese Restaurant Comfort Food Restaurant Cuban Restaurant Czech Restaurant Dim Sum Restaurant Dumpling Restaurant Eastern European Restaurant Empanada Restaurant Ethiopian Restaurant Falafel Restaurant Fast Food Restaurant Filipino Restaurant French Restaurant German Restaurant Greek Restaurant Hawaiian Restaurant Himalayan Restaurant Hotpot Restaurant Indian Restaurant Israeli Restaurant Italian Restaurant Japanese Curry Restaurant Japanese Restaurant Jewish Restaurant Kebab Restaurant Korean Restaurant Kosher Restaurant Latin American Restaurant Lebanese Restaurant Malay Restaurant Mediterranean Restaurant Mexican Restaurant Middle Eastern Restaurant Modern European Restaurant Molecular Gastronomy Restaurant Moroccan Restaurant New American Restaurant North Indian Restaurant Paella Restaurant Persian Restaurant Peruvian Restaurant Ramen Restaurant Restaurant Russian Restaurant Scandinavian Restaurant Seafood Restaurant Shanghai Restaurant Soba Restaurant South American Restaurant South Indian Restaurant Southern / Soul Food Restaurant Spanish Restaurant Sushi Restaurant Swiss Restaurant Szechuan Restaurant Taiwanese Restaurant Tapas Restaurant Thai Restaurant Theme Restaurant Turkish Restaurant Udon Restaurant Vegetarian / Vegan Restaurant Venezuelan Restaurant Vietnamese Restaurant
12 Marble Hill 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
26 Chinatown 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
27 Chinatown 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
32 Chinatown 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
36 Chinatown 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The dataset below (mx_only) shows the Manhattan neighborhoods that already have one or more Mexican restaurants and that, as a result, should be excluded from final consideration based on the initial criteria.

In [15]:
# Which neighborhoods already have mexican restaurants? We do not want these neighborhoods
mx_only = restaurant_onehot[restaurant_onehot['Mexican Restaurant']>0]
mx_only = mx_only.groupby('Neighborhood').count().reset_index()

col_list = ['Neighborhood','Mexican Restaurant']
mx_only = mx_only[col_list]
mx_only.shape
mx_only.head(15)
Out[15]:
Neighborhood Mexican Restaurant
0 Battery Park City 1
1 Carnegie Hill 1
2 Chinatown 2
3 East Harlem 5
4 East Village 5
5 Financial District 2
6 Flatiron 2
7 Gramercy 3
8 Hamilton Heights 3
9 Inwood 4
10 Lenox Hill 1
11 Lincoln Square 1
12 Little Italy 1
13 Lower East Side 1
14 Manhattan Valley 2

The heat map below shows that the lower West Side and West Midtown are notably lacking in Mexican restaurants

In [16]:
from folium.plugins import HeatMap
#basemap = generateBaseMap()

heat_map = folium.Map(location=[latitude, longitude], zoom_start=12)
m2 = mexicanRestaurants
m2 = m2[['Latitude','Longitude']]
#m2 = m2.reset_index()
m2.head()

heat_data = [[row['Latitude'],row['Longitude']] for index, row in m2.iterrows()]
HeatMap(heat_data).add_to(heat_map)
heat_map
Out[16]:
[Interactive folium map not rendered in this export: Map 2, concentration of Mexican restaurants]
In [17]:
#manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
#manhattan_grouped.head()
##This dataset has the column we are keenly interested in ['Mexican Restaurant']

AllRestaurants_grouped = restaurant_onehot.groupby('Neighborhood').mean().reset_index()
AllRestaurants_grouped.head()
Out[17]:
Neighborhood African Restaurant American Restaurant Arepa Restaurant Argentinian Restaurant Asian Restaurant Australian Restaurant Austrian Restaurant Brazilian Restaurant Cajun / Creole Restaurant Cantonese Restaurant Caribbean Restaurant Chinese Restaurant Comfort Food Restaurant Cuban Restaurant Czech Restaurant Dim Sum Restaurant Dumpling Restaurant Eastern European Restaurant Empanada Restaurant Ethiopian Restaurant Falafel Restaurant Fast Food Restaurant Filipino Restaurant French Restaurant German Restaurant Greek Restaurant Hawaiian Restaurant Himalayan Restaurant Hotpot Restaurant Indian Restaurant Israeli Restaurant Italian Restaurant Japanese Curry Restaurant Japanese Restaurant Jewish Restaurant Kebab Restaurant Korean Restaurant Kosher Restaurant Latin American Restaurant Lebanese Restaurant Malay Restaurant Mediterranean Restaurant Mexican Restaurant Middle Eastern Restaurant Modern European Restaurant Molecular Gastronomy Restaurant Moroccan Restaurant New American Restaurant North Indian Restaurant Paella Restaurant Persian Restaurant Peruvian Restaurant Ramen Restaurant Restaurant Russian Restaurant Scandinavian Restaurant Seafood Restaurant Shanghai Restaurant Soba Restaurant South American Restaurant South Indian Restaurant Southern / Soul Food Restaurant Spanish Restaurant Sushi Restaurant Swiss Restaurant Szechuan Restaurant Taiwanese Restaurant Tapas Restaurant Thai Restaurant Theme Restaurant Turkish Restaurant Udon Restaurant Vegetarian / Vegan Restaurant Venezuelan Restaurant Vietnamese Restaurant
0 Battery Park City 0.0 0.000000 0.0 0.000000 0.000000 0.0 0.000000 0.0 0.0 0.000000 0.000000 0.250000 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.250000 0.0 0.250000 0.0 0.0 0.0 0.000000 0.0 0.0 0.000000 0.000000 0.250000 0.000000 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.0 0.000000 0.000000 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.000000 0.0 0.000000
1 Carnegie Hill 0.0 0.000000 0.0 0.045455 0.000000 0.0 0.000000 0.0 0.0 0.000000 0.000000 0.045455 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.000000 0.0 0.045455 0.0 0.136364 0.0 0.000000 0.0 0.0 0.000000 0.090909 0.000000 0.136364 0.0 0.045455 0.0 0.0 0.0 0.045455 0.0 0.0 0.000000 0.045455 0.045455 0.000000 0.0 0.0 0.0 0.045455 0.0 0.000000 0.0 0.0 0.045455 0.045455 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.000000 0.000000 0.045455 0.0 0.0 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.045455 0.0 0.090909
2 Central Harlem 0.2 0.133333 0.0 0.000000 0.000000 0.0 0.000000 0.0 0.0 0.000000 0.066667 0.133333 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.066667 0.0 0.000000 0.0 0.133333 0.0 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.0 0.0 0.0 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.000000 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.0 0.000000 0.000000 0.0 0.0 0.133333 0.000000 0.0 0.0 0.0 0.066667 0.000000 0.000000 0.0 0.0 0.000000 0.066667 0.000000 0.0 0.0 0.0 0.000000 0.0 0.000000
3 Chelsea 0.0 0.181818 0.0 0.000000 0.045455 0.0 0.000000 0.0 0.0 0.000000 0.000000 0.045455 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.000000 0.0 0.000000 0.0 0.090909 0.0 0.000000 0.0 0.0 0.000000 0.045455 0.045455 0.136364 0.0 0.090909 0.0 0.0 0.0 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.045455 0.0 0.0 0.0 0.045455 0.0 0.045455 0.0 0.0 0.000000 0.045455 0.0 0.0 0.045455 0.000000 0.0 0.0 0.0 0.000000 0.000000 0.045455 0.0 0.0 0.000000 0.045455 0.000000 0.0 0.0 0.0 0.000000 0.0 0.000000
4 Chinatown 0.0 0.105263 0.0 0.000000 0.052632 0.0 0.026316 0.0 0.0 0.026316 0.000000 0.236842 0.0 0.0 0.0 0.052632 0.026316 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.052632 0.0 0.0 0.105263 0.000000 0.000000 0.026316 0.0 0.000000 0.0 0.0 0.0 0.000000 0.0 0.0 0.052632 0.000000 0.052632 0.000000 0.0 0.0 0.0 0.026316 0.0 0.000000 0.0 0.0 0.000000 0.000000 0.0 0.0 0.000000 0.026316 0.0 0.0 0.0 0.000000 0.026316 0.000000 0.0 0.0 0.026316 0.000000 0.026316 0.0 0.0 0.0 0.026316 0.0 0.026316

To keep this analysis simple, we just pull data from neighborhoods where the Mexican restaurant category is not among the top 10 venue categories

In [18]:
#This function sorts venues

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
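
To see what this function returns, the sketch below applies the same logic to a single made-up row. The `top_categories` name and the frequencies are assumptions for illustration; the body mirrors return_most_common_venues above.

```python
import pandas as pd


def top_categories(row, num_top_venues):
    """Same idea as return_most_common_venues: skip the first entry, sort the rest."""
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Hypothetical frequencies for one neighborhood (the first entry is its name).
row = pd.Series({
    "Neighborhood": "Example",
    "Italian Restaurant": 0.5,
    "Thai Restaurant": 0.2,
    "Mexican Restaurant": 0.3,
})

print(list(top_categories(row, 2)))
```

Because the function always returns `num_top_venues` labels, a neighborhood with only a few distinct restaurant categories will have its lower ranks filled by categories whose frequency is zero. In the grouped output above, Battery Park City has just four non-zero categories, so most of its ten ranked positions carry no information.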
In [19]:
## This analysis may not be necessary just yet
##But Just in case it is useful! DATA is KING!

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # no 'st'/'nd'/'rd' suffix left, fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
#neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']
neighborhoods_venues_sorted['Neighborhood'] = AllRestaurants_grouped['Neighborhood']

#for ind in np.arange(manhattan_grouped.shape[0]):
#    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

for ind in np.arange(AllRestaurants_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind,1:] = return_most_common_venues(AllRestaurants_grouped.iloc[ind,:], num_top_venues)

neighborhoods_venues_sorted.shape #This sorts restaurants in each neighborhood based on their frequency
neighborhoods_venues_sorted.head()
Out[19]:
Neighborhood 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
0 Battery Park City Japanese Restaurant Italian Restaurant Mexican Restaurant Chinese Restaurant Vietnamese Restaurant Hawaiian Restaurant Fast Food Restaurant Filipino Restaurant French Restaurant German Restaurant
1 Carnegie Hill French Restaurant Italian Restaurant Vietnamese Restaurant Indian Restaurant Restaurant Argentinian Restaurant Chinese Restaurant Fast Food Restaurant Japanese Restaurant Mediterranean Restaurant
2 Central Harlem African Restaurant American Restaurant Seafood Restaurant French Restaurant Chinese Restaurant Southern / Soul Food Restaurant Tapas Restaurant Ethiopian Restaurant Caribbean Restaurant Fast Food Restaurant
3 Chelsea American Restaurant Italian Restaurant French Restaurant Japanese Restaurant Restaurant Middle Eastern Restaurant Chinese Restaurant Paella Restaurant Sushi Restaurant New American Restaurant
4 Chinatown Chinese Restaurant Hotpot Restaurant American Restaurant Malay Restaurant Greek Restaurant Mexican Restaurant Asian Restaurant Dim Sum Restaurant Vietnamese Restaurant Cantonese Restaurant
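
The column labels above are built by indexing a three-item suffix list and falling back to "th" when the index runs out. That works for ten columns, but a small helper makes the rule explicit and also handles numbers such as 11 or 21. The `ordinal` function below is a hypothetical helper, not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st' (hypothetical helper)."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the rest of the teens always take 'th'.
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


# Rebuild the same column labels used for neighborhoods_venues_sorted.
columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns[1], "|", columns[10])
```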

The data above (neighborhoods_venues_sorted) ranks the most common restaurant categories in each neighborhood. Ideal locations for our investigation would only include neighborhoods that do not have Mexican restaurants.

Cluster the neighborhoods

In [22]:
# set number of clusters
kclusters = 7

#manhattan_grouped_clustering = manhattan_grouped.drop('Neighborhood', 1)

## run k-means clustering
#kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

## check cluster labels generated for each row in the dataframe
#kmeans.labels_[0:10] 

RestaurantClustering = AllRestaurants_grouped.drop('Neighborhood', axis=1) # keep only the numeric frequency columns
kmeans = KMeans(n_clusters = kclusters, random_state=0).fit(RestaurantClustering)
kmeans.labels_[0:10]
Out[22]:
array([3, 6, 1, 1, 3, 6, 1, 3, 3, 3], dtype=int32)
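
For intuition about what these labels mean, the sketch below runs k-means on four made-up neighborhoods described by three category frequencies each. The matrix `X` is invented for illustration, and `n_init=10` is set explicitly as an assumption of this example; rows with similar restaurant mixes should end up with the same label.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical category-frequency rows for four neighborhoods (two obvious groups).
X = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])

# Two clusters are enough for this toy data; the notebook uses kclusters = 7.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
print(labels)
```

The label numbers themselves are arbitrary identifiers. Only the grouping matters, so cluster 3 is not "closer" to cluster 4 than to cluster 0.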
In [23]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_) # run once: insert raises ValueError if the column already exists

#manhattan_merged = manhattan_data
restaurant_merged = AllRestaurants

# merge manhattan_grouped with manhattan_data to add latitude/longitude for each neighborhood
#manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
restaurant_merged = restaurant_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on = 'Neighborhood')

#manhattan_merged.head() # check the last columns!
restaurant_merged.drop(['Neighborhood_Latitude','Neighborhood_Longitude','Venue_Category','Venue'], axis = 1, inplace = True)
#restaurant_merged.head()
#restaurant_merged.shape

#neighborhoods_venues_sorted
#restaurant_merged.head(15)
restaurant_merged.shape
Out[23]:
(903, 14)

Visualization of the Neighborhood Clusters

In [24]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(restaurant_merged['Latitude'], restaurant_merged['Longitude'], restaurant_merged['Neighborhood'], restaurant_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=2,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
Out[24]:
[Interactive folium map not rendered in this export: Map 3, KMeans clusters]

Further Analysis using Visual Tools

In [82]:
#Datasets that can be used for Analysis

#neighborhoods_venues_sorted.head(15) #This table shows the frequency of restaurants in neighborhoods #Note where Mexican Restaurants occur
#AllRestaurants.tail(15) #This table show coordinates for All Restaurants (including Mexican) pulled from FourSquare for Manhattan
#mexicanRestaurants.head()
pd_Restaurant = restaurant_onehot.groupby('Neighborhood').sum()
pd_Restaurant.columns = [column.replace(" ","_") for column in pd_Restaurant]
NoMexicanRestaurant = pd_Restaurant[pd_Restaurant.Mexican_Restaurant == 0]
NoMexicanRestaurant #These neighborhoods have NO MEXICAN RESTAURANTS
#pd_Restaurant.head()
Out[82]:
African_Restaurant American_Restaurant Arepa_Restaurant Argentinian_Restaurant Asian_Restaurant Australian_Restaurant Austrian_Restaurant Brazilian_Restaurant Cajun_/_Creole_Restaurant Cantonese_Restaurant Caribbean_Restaurant Chinese_Restaurant Comfort_Food_Restaurant Cuban_Restaurant Czech_Restaurant Dim_Sum_Restaurant Dumpling_Restaurant Eastern_European_Restaurant Empanada_Restaurant Ethiopian_Restaurant Falafel_Restaurant Fast_Food_Restaurant Filipino_Restaurant French_Restaurant German_Restaurant Greek_Restaurant Hawaiian_Restaurant Himalayan_Restaurant Hotpot_Restaurant Indian_Restaurant Israeli_Restaurant Italian_Restaurant Japanese_Curry_Restaurant Japanese_Restaurant Jewish_Restaurant Kebab_Restaurant Korean_Restaurant Kosher_Restaurant Latin_American_Restaurant Lebanese_Restaurant Malay_Restaurant Mediterranean_Restaurant Mexican_Restaurant Middle_Eastern_Restaurant Modern_European_Restaurant Molecular_Gastronomy_Restaurant Moroccan_Restaurant New_American_Restaurant North_Indian_Restaurant Paella_Restaurant Persian_Restaurant Peruvian_Restaurant Ramen_Restaurant Restaurant Russian_Restaurant Scandinavian_Restaurant Seafood_Restaurant Shanghai_Restaurant Soba_Restaurant South_American_Restaurant South_Indian_Restaurant Southern_/_Soul_Food_Restaurant Spanish_Restaurant Sushi_Restaurant Swiss_Restaurant Szechuan_Restaurant Taiwanese_Restaurant Tapas_Restaurant Thai_Restaurant Theme_Restaurant Turkish_Restaurant Udon_Restaurant Vegetarian_/_Vegan_Restaurant Venezuelan_Restaurant Vietnamese_Restaurant
Neighborhood
Central Harlem 3 2 0 0 0 0 0 0 0 0 1 2 0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0
Chelsea 0 4 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 1 1 3 0 2 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0
Civic Center 0 3 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 4 0 0 0 0 0 1 0 3 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 1 0 0
Clinton 0 5 0 0 0 0 0 1 0 0 0 2 0 0 0 1 0 0 0 1 0 0 0 2 0 0 0 0 0 0 0 5 0 0 0 1 1 0 0 0 0 2 0 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0
Greenwich Village 0 2 0 0 0 0 0 0 0 0 2 3 0 1 0 0 0 1 0 0 1 0 0 2 0 0 0 0 0 3 0 11 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0 0 0 1 1 0 0 2 0 0 0 0 0 0 4 0 0 0 1 1 1 0 1 1 0 2
Hudson Yards 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 2 0 0 0 0 0 0
Marble Hill 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Midtown 0 1 0 0 0 0 0 0 0 0 0 1 0 2 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 3 0 1 0 0 0 0 0 0 0 0 1
Midtown South 0 3 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 4 0 0 17 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
Roosevelt Island 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Sutton Place 0 2 0 0 1 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 2 0 6 0 0 0 0 0 0 2 1 0 2 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 2 0 0 0 2 0 1
Turtle Bay 0 2 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 1 2 0 0 0 1 0 5 1 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 3 1 0 0 3 0 1 0 0 0 0 4 0 0 0 0 2 0 2 0 0 0 0
West Village 0 6 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 2 0 0 0 0 0 1 0 9 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 5 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
In [99]:
import matplotlib.pyplot as plt